Adaptation of Graph-Based Semi-Supervised Methods to Large-Scale Text Data

Authors

  • Frank Lin
  • William W. Cohen
Abstract

Graph-based semi-supervised learning methods have been shown to be efficient and effective on network data by propagating labels along neighboring nodes. These methods can also be applied to general data by constructing a graph in which the nodes are the instances and the edges are weighted by the similarity between the instances' feature vectors. However, whereas a natural network is often sparse, a network of pairwise similarities between instances is dense, and prohibitively large for even moderately sized text datasets. We show how a simple general technique allows these learning methods to be applied to text data exactly and efficiently, over the complete pairwise similarity manifold, without resorting to sampling or sparsification. This technique also provides a unifying view of prior work on label propagation on text graphs, and we assess its effectiveness when applied to two popular graph-based semi-supervised methods on several large real datasets.
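The abstract does not spell out the technique itself. As a minimal sketch of one way exact propagation over a dense similarity graph can avoid building that graph, assume (this is an assumption, not something stated above) that similarity is the inner product of feature vectors, so the n-by-n similarity matrix is S = F F^T for an n-by-m feature matrix F. A propagation step S v can then be computed as F (F^T v), which touches only F. The function names, the toy data, and the clamped row-normalized update below are hypothetical illustrations, not the authors' method.

```python
def matvec(F, v):
    """Return F @ v for a row-major list-of-lists matrix F."""
    return [sum(f * x for f, x in zip(row, v)) for row in F]


def rmatvec(F, v):
    """Return F.T @ v without building the transpose."""
    out = [0.0] * len(F[0])
    for row, x in zip(F, v):
        for j, f in enumerate(row):
            out[j] += f * x
    return out


def implicit_similarity_matvec(F, v):
    """Compute (F F^T) v as F (F^T v); the n-by-n matrix is never formed."""
    return matvec(F, rmatvec(F, v))


def propagate(F, seeds, steps=10):
    """Hypothetical update: v <- D^-1 (F F^T) v, then reset the labeled seeds."""
    n = len(F)
    # Row sums of the implicit similarity matrix, again without forming it.
    degree = implicit_similarity_matvec(F, [1.0] * n)
    v = [seeds.get(i, 0.0) for i in range(n)]
    for _ in range(steps):
        sv = implicit_similarity_matvec(F, v)
        v = [s / d if d > 0 else 0.0 for s, d in zip(sv, degree)]
        for i, label in seeds.items():
            v[i] = label  # clamp labeled instances to their known labels
    return v


# Toy data (assumed): 4 documents over 4 terms, forming two disjoint topics.
F = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]
seeds = {0: 1.0, 2: -1.0}  # document 0 is labeled +1, document 2 is labeled -1
scores = propagate(F, seeds)
```

In this toy run, the unlabeled documents 1 and 3 end up with scores of the same sign as the labeled document they share a term with. With a sparse F, each step costs time proportional to the number of nonzero entries in F rather than n squared, which is the point of keeping the similarity matrix implicit.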

Similar resources

Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models

We describe a new scalable algorithm for semi-supervised training of conditional random fields (CRF) and its application to part-of-speech (POS) tagging. The algorithm uses a similarity graph to encourage similar n-grams to have similar POS tags. We demonstrate the efficacy of our approach on a domain adaptation task, where we assume that we have access to large amounts of unlabeled data from the...

Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition

Graph-based semi-supervised learning (SSL) algorithms have been successfully used to extract class-instance pairs from large unstructured and structured text collections. However, a careful comparison of different graph-based SSL algorithms on that task has been lacking. We compare three graph-based SSL algorithms for class-instance acquisition on a variety of graphs constructed from different ...

Large-Scale Graph-based Semi-Supervised Learning via Tree Laplacian Solver

Graph-based semi-supervised learning is one of the most popular and successful semi-supervised learning methods. Typically, it predicts the labels of unlabeled data by minimizing a quadratic objective induced by the graph, which is unfortunately a procedure of polynomial complexity in the sample size n. In this paper, we address this scalability issue by proposing a method that approximately so...

Graph-based semi-supervised learning with multi-modality propagation for large-scale image datasets

Semi-supervised learning (SSL) is widely used to exploit the vast amount of unlabeled data in the world. Over the past decade, graph-based SSL has become popular in automatic image annotation due to its power of learning globally based on local similarity. However, recent studies have shown that the emergence of large-scale datasets challenges the traditional methods. On the other hand, most previous w...

Incremental Spectral Sparsification for Large-Scale Graph-Based Semi-Supervised Learning

While the harmonic function solution performs well in many semi-supervised learning (SSL) tasks, it is known to scale poorly with the number of samples. Recent successful and scalable methods, such as the eigenfunction method [11], focus on efficiently approximating the whole spectrum of the graph Laplacian constructed from the data. This is in contrast to various subsampling and quantization me...

Publication date 2011